# High-resolution Processing

- **Auramask Ensemble Moon** (logasja) · GPL-3.0 · Image Generation · 17 downloads, 0 likes
  An improved VNet architecture for 2D image-to-image translation, with adversarial and aesthetic optimization objectives.
- **C-RADIOv2** (nvidia) · Other · Transformers · 648 downloads, 11 likes
  A visual feature extraction model from NVIDIA, available in multiple sizes, suited to image understanding and dense prediction tasks.
- **`vit_so400m_patch14_siglip_gap_384.webli`** (timm) · Apache-2.0 · Image Classification, Transformers · 96 downloads, 0 likes
  SigLIP-based Vision Transformer that produces image features via global average pooling.
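For the `gap` variant above, the image feature comes from global average pooling over the patch tokens rather than an attention-pooling head. A minimal sketch of that pooling step, using dummy token embeddings (the shapes are illustrative, not the model's actual dimensions):

```python
def global_average_pool(tokens):
    """Average a list of patch-token embeddings into one image feature.

    tokens: list of equal-length vectors, one per image patch.
    Returns a single vector of the same dimensionality.
    """
    dim = len(tokens[0])
    n = len(tokens)
    return [sum(tok[i] for tok in tokens) / n for i in range(dim)]

# Dummy "patch tokens": 4 patches with 3-dimensional embeddings.
patch_tokens = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [0.0, 0.0, 0.0],
    [4.0, 4.0, 4.0],
]
image_feature = global_average_pool(patch_tokens)
print(image_feature)  # → [2.0, 2.0, 2.0]
```

In the real model the pooled vector would then feed the downstream head; the attention-pooling variants below instead learn a weighted combination of the tokens.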
- **`vit_base_patch16_siglip_512.webli`** (timm) · Apache-2.0 · Image Classification, Transformers · 702 downloads, 0 likes
  SigLIP-based Vision Transformer containing only the image encoder, with the original attention-pooling head.
- **`vit_base_patch16_siglip_256.webli_i18n`** (timm) · Apache-2.0 · Image Classification, Transformers · 16 downloads, 0 likes
  SigLIP-based ViT-B/16 containing only the image encoder, with the original attention-pooling head.
- **`convnext_large_mlp.clip_laion2b_ft_soup_320`** (timm) · Apache-2.0 · Image Classification, Transformers · 173 downloads, 0 likes
  ConvNeXt-Large CLIP image encoder fine-tuned on LAION-2B, supporting feature extraction at 320×320 resolution.
- **DUSt3R ViTLarge BaseDecoder 512 DPT** (naver) · 3D Vision · 46.93k downloads, 14 likes
  DUSt3R makes geometric 3D vision from images straightforward, reconstructing 3D scenes from one or more input images.
- **ViT-L-14-336** (asakhare) · MIT · Image Classification · 20 downloads, 0 likes
  Large vision-language model built on the Vision Transformer architecture, supporting zero-shot image classification.
- **Artwork Scorer** (Muinez) · Apache-2.0 · Image Classification, Transformers · 32 downloads, 5 likes
  A fine-tune of Facebook's ConvNeXt V2 architecture, trained for multi-label classification of Pixiv ranking images.
- **`eva02_enormous_patch14_clip_224.laion2b_s4b_b115k`** (timm) · MIT · Text-to-Image · 130 downloads, 1 like
  Large vision-language model based on the EVA02 architecture, supporting zero-shot image classification.
- **`eva02_large_patch14_clip_336.merged2b_s6b_b61k`** (timm) · MIT · Text-to-Image · 15.78k downloads, 0 likes
  EVA02 is a large vision-language model using the CLIP architecture, supporting zero-shot image classification.
- **`vit_large_patch16_224`** (google) · Apache-2.0 · Image Classification · 188.47k downloads, 30 likes
  Transformer-based image classification model, pre-trained on ImageNet-21k and fine-tuned on ImageNet-1k.
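Several entries above (CLIP, SigLIP, EVA02) support zero-shot classification by comparing one image embedding against text embeddings of candidate labels and picking the most similar. A minimal sketch of that scoring step, using dummy embeddings (the vectors and label prompts are placeholders, not real encoder outputs):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def zero_shot_classify(image_emb, label_embs):
    """Rank candidate labels by cosine similarity to the image embedding."""
    scores = {label: cosine(image_emb, emb) for label, emb in label_embs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Dummy embeddings standing in for image- and text-encoder outputs.
image_emb = [0.9, 0.1, 0.0]
label_embs = {
    "a photo of a cat": [1.0, 0.0, 0.0],
    "a photo of a dog": [0.0, 1.0, 0.0],
}
ranking = zero_shot_classify(image_emb, label_embs)
print(ranking[0][0])  # → a photo of a cat
```

With a real model, `image_emb` and each text embedding would come from the model's image and text encoders, and the similarities are typically scaled by a learned temperature before a softmax; the ranking logic is the same.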